Abstract

The paper analyzes the scalability of multiobjective estimation of distribution algorithms
(MOEDAs) on a class of boundedly-difficult additively-separable multiobjective optimization
problems. The paper illustrates that even if the linkage is correctly identified, massive
multimodality of the search problems can easily overwhelm the nicher and lead to exponential
scale-up. Facetwise models are subsequently used to propose a growth rate of the number
of differing substructures between the two objectives that keeps the niching method from being
overwhelmed and leads to polynomial scalability of MOEDAs.



1 Introduction 



One of the challenging areas in genetic and evolutionary computation that has received
increased attention is multiobjective evolutionary algorithms (MOEAs). Several MOEAs
have been proposed and applied with significant success to real-world problems (Deb, 2001;
Coello Coello, Van Veldhuizen, & Lamont, 2002). However, studies on the theory and analysis of
MOEAs have been limited, in part because of the complexity of both the algorithms and the
problems. For example, some aspects of problem difficulty and algorithm scalability have been
studied only recently (Deb, 1999; Chen, 2004).

Recently, there has been growing interest in extending estimation of distribution algorithms (EDAs)
(Pelikan, Lobo, & Goldberg, 2002; Larrañaga & Lozano, 2002) to solve multiobjective search
problems quickly, reliably, and accurately. EDAs are a class of competent genetic algorithms
(Goldberg, 1999) that replace the traditional variation operators of genetic algorithms (GAs)
with probabilistic model building of promising solutions and sampling of the model to generate
new offspring. Such multiobjective EDAs (MOEDAs) (Bosman & Thierens, 2002b;
Khan, Goldberg, & Pelikan, 2002; Ocenasek, 2002; Ahn, 2005) typically combine the
model-building and sampling procedures of EDAs with the selection procedure of MOEAs such as
the non-dominated sorting GA (NSGA-II) (Deb, Pratap, Agrawal, & Meyarivan, 2002) and a niching
method such as sharing or crowding in objective space. MOEDAs have been shown to significantly
outperform traditional MOEAs in efficiently searching and maintaining Pareto-optimal solutions
with high probability on boundedly-difficult problems.

However, the scalability of the population size and the number of function evaluations required
by MOEDAs as a function of problem size and the number of Pareto-optimal solutions has been largely
ignored. This is the case even though one of the primary motives for designing MOEDAs is to
carry over the polynomial (oftentimes sub-quadratic) scalability of EDAs to boundedly-difficult
multiobjective search problems. Therefore, we investigate the scalability of MOEDAs, specifically
the multiobjective extended compact GA (meCGA) and the multiobjective Bayesian optimization
algorithm (mBOA) (Khan, Goldberg, & Pelikan, 2002), on a class of boundedly-difficult
additively-separable problems. We demonstrate that even if the sub-structures (or linkages) are
correctly identified, massive multimodality of the search problems can overwhelm the niching
capability and lead to exponential scalability. Using facetwise models, we predict the growth rate
of the number of competing sub-structures that keeps the nicher from being overwhelmed and leads
to polynomial scalability.

The paper is organized as follows. We provide a brief background on MOEAs, particularly
NSGA-II, and MOEDAs, specifically meCGA, in section 2. The details on the test problems and
experimental methodologies are described in section 3. Section 4.1 presents the scalability
results of MOEDAs on a class of boundedly-difficult additively-decomposable multiobjective
problems. Subsequently, we demonstrate how massive multimodality of the search problem can
overwhelm the niching mechanism and lead to exponential scale-up in section 4.2. In section 4.3,
using facetwise models of population sizing for EDAs and niching methods, we propose a method to
predict the growth rate of the number of sub-structures that keeps the nicher from being
overwhelmed and leads to polynomial scalability. Finally, we present key conclusions of the study.



2 Background 

In this section we briefly review work on multiobjective evolutionary algorithms, specifically NSGA- 
II. We also outline previous work on multiobjective estimation of distribution algorithms and de- 
scribe two MOEDAs that we studied. 



2.1 Multiobjective Evolutionary Algorithms (MOEAs) 

Unlike traditional search methods, genetic and evolutionary algorithms are naturally suited for
multiobjective optimization, as they can process a number of solutions in parallel and find all or
a majority of the solutions in the Pareto-optimal front. Based on Goldberg's (Goldberg, 1989)
suggestion of implementing a selection procedure that uses the non-domination principle, many
MOEAs have been proposed (Horn, Nafpliotis, & Goldberg, 1994; Srinivas & Deb, 1995;
Fonseca & Fleming, 1993; Deb, Pratap, Agrawal, & Meyarivan, 2002; Zitzler, Laumanns, & Thiele, 2001;
Corne, Jerram, Knowles, & Oates, 2001; Erickson, Mayer, & Horn, 2001;
Van Veldhuizen & Lamont, 2000; Zydallis, Van Veldhuizen, & Lamont, 2001). A detailed survey
of various MOEAs is out of the scope of this study, and interested readers should refer to
(Deb, 2001; Coello Coello, Van Veldhuizen, & Lamont, 2002) and the references therein. Since
the selection and niching procedures of NSGA-II are used in the MOEDAs used in this study, we
describe them in the following paragraphs.

The selection procedure of NSGA-II consists of three elements:
Non-dominated sorting: This step assigns domination ranks to individuals in the population
based on their multiple objective values. A candidate solution X dominates Y if X is no worse than
Y in all objectives and X is better than Y in at least one objective. In non-dominated sorting,
we start with the set of solutions that are not dominated by any solution in the population and
assign them rank 1. Next, solutions that are not dominated by any of the remaining solutions are
assigned rank 2. That is, all solutions with rank 2 are dominated by at least one solution with
rank 1, but are not dominated by any solution outside rank 1. The sorting and ranking process
continues by assigning increasing ranks to those solutions that are not dominated by any of the
remaining unranked solutions. After non-dominated sorting, we are left with subsets of the
population with different ranks. Solutions with a given rank are not dominated by solutions that
have the same rank or higher and are dominated by at least one solution with a lower rank.
Therefore, with respect to Pareto optimality, solutions with lower ranks should be given priority.
Crowding distance computation: Apart from finding solutions in the Pareto front, it is also 
essential to achieve good coverage or spread of solutions in the front. The diversity of solutions in 
the objective space is usually maintained with a niching mechanism and NSGA-II uses crowding 
for doing so. Each solution in the population is assigned a crowding distance, which estimates how 
dense the non-dominated front is in the neighborhood of the solution. Therefore, the higher the 
crowding distance of the solution, the more diverse the solution is in the non-dominated front. The 
pseudocode for computing the crowding distance is outlined below: 

crowding_distance_computation(P)
    for rank r = 1 to R
        P_r = subset of solutions in P with rank r
        n_r = size(P_r)
        for i = 1 to n_r
            d_c(P_r(i)) = 0
        for j = 1 to M
            Q_r = sort P_r using the j-th objective, f_j
            d_c(Q_r(1)) = d_c(Q_r(n_r)) = ∞
            for i = 2 to n_r - 1
                dist = Q_r(i+1).f_j - Q_r(i-1).f_j
                d_c(Q_r(i)) = d_c(Q_r(i)) + dist
    return d_c

where P is the population, R is the maximum rank assigned in the population, M is the number
of objectives, and Q_r(i).f_j is the value of the j-th objective of the i-th individual in Q_r.

Individual comparison operator: NSGA-II uses a custom comparison operator to compare
the quality of two solutions and to select the better individual. Both the rank and the crowding
distance of the two solutions are used in the comparison operator, pseudocode for which is given
below. First, the ranks of the two individuals are considered and the solution with the lower rank
is selected. If the two individuals have the same rank, then the solution with the higher crowding
distance is selected.

compare(X, Y)
    if rank(X) < rank(Y) then return X
    if rank(X) > rank(Y) then return Y
    if rank(X) = rank(Y)
        if d_c(X) > d_c(Y) then return X
        if d_c(X) < d_c(Y) then return Y
        if d_c(X) = d_c(Y)
            then randomly choose either X or Y
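
The three elements above can be put together in a few lines of Python. The following is a minimal sketch rather than the NSGA-II reference implementation: it assumes all objectives are maximized, uses a simple quadratic-time peeling of fronts instead of the faster bookkeeping of NSGA-II, and the function names (`dominates`, `non_dominated_sort`, `crowding_distances`, `compare`) are illustrative choices, not names from the original work.

```python
import math
import random


def dominates(fx, fy):
    """True if fx is no worse than fy in every objective and better in at least one."""
    return all(a >= b for a, b in zip(fx, fy)) and any(a > b for a, b in zip(fx, fy))


def non_dominated_sort(objs):
    """Return a rank (1 = best) for every objective vector in objs."""
    ranks = [0] * len(objs)
    remaining = set(range(len(objs)))
    rank = 1
    while remaining:
        # The current front: unranked solutions not dominated by any unranked solution.
        front = {i for i in remaining
                 if not any(dominates(objs[j], objs[i]) for j in remaining)}
        for i in front:
            ranks[i] = rank
        remaining -= front
        rank += 1
    return ranks


def crowding_distances(objs, ranks):
    """Crowding distance of every individual, computed separately within each rank."""
    dist = [0.0] * len(objs)
    for r in set(ranks):
        members = [i for i, ri in enumerate(ranks) if ri == r]
        for j in range(len(objs[0])):
            order = sorted(members, key=lambda i: objs[i][j])
            dist[order[0]] = dist[order[-1]] = math.inf  # boundary solutions
            for pos in range(1, len(order) - 1):
                dist[order[pos]] += objs[order[pos + 1]][j] - objs[order[pos - 1]][j]
    return dist


def compare(x, y, ranks, dist, rng=random):
    """Return the index of the preferred individual: lower rank, then larger distance."""
    if ranks[x] != ranks[y]:
        return x if ranks[x] < ranks[y] else y
    if dist[x] != dist[y]:
        return x if dist[x] > dist[y] else y
    return rng.choice((x, y))


objs = [(1, 5), (2, 4), (3, 3), (4, 2), (5, 1), (1, 1), (2, 2)]
ranks = non_dominated_sort(objs)
dist = crowding_distances(objs, ranks)
print(ranks, dist)
```

In this small example the first five points are mutually non-dominated and receive rank 1, while (2, 2) and (1, 1) fall into later fronts.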

2.2 Multiobjective Estimation of Distribution Algorithms (MOEDA)



Similar to single-objective EDAs (Pelikan, Lobo, & Goldberg, 2002; Larrañaga & Lozano, 2002),
multiobjective EDAs replace the variation operators of MOEAs with the probabilistic
model building of promising solutions and sampling of the model to generate new
offspring. Recently, several MOEDAs have been proposed (Bosman & Thierens, 2002b;
Khan, Goldberg, & Pelikan, 2002; Ocenasek, 2002; Khan, 2002; Bosman & Thierens, 2002a;
Laumanns & Ocenasek, 2002; Ahn, 2005) which have combined variants of the Bayesian
optimization algorithm (BOA) (Pelikan, Goldberg, & Cantú-Paz, 2000) and the iterated density
estimation evolutionary algorithm (IDEA) (Bosman & Thierens, 1999; Bosman & Thierens, 2000)
with the selection and replacement procedures of MOEAs.

Khan et al. (Khan, Goldberg, & Pelikan, 2002; Khan, 2002) proposed multiobjective BOA
(mBOA) and multiobjective hierarchical BOA (mhBOA) by combining the model building and
model sampling procedures of BOA and hierarchical BOA (hBOA) (Pelikan & Goldberg, 2001)
with the selection procedure of NSGA-II. They compared the performance of mBOA and mhBOA
with that of NSGA-II on a class of boundedly-difficult additively-separable deceptive and
hierarchically deceptive functions. Laumanns and Ocenasek (Laumanns & Ocenasek, 2002;
Ocenasek, 2002) combined mixed BOA with the replacement procedure of the strength Pareto
evolutionary algorithm (SPEA2) (Zitzler, Laumanns, & Thiele, 2001) and compared it with NSGA-II
and SPEA2 on knapsack problems. Ahn (Ahn, 2005) combined real-coded BOA with the selection
procedure of NSGA-II with a sharing intensity measure and a modified crowding distance metric.

Bosman and Thierens (Bosman & Thierens, 2002b; Bosman & Thierens, 2002a) combined
IDEAs and mixed IDEAs with non-dominated tournament selection and clustering. In contrast
to other MOEDAs, they used clustering to split the population into sub-populations, and separate
models were built for each sub-population. They used the clustering procedure primarily to
obtain a good model of the selected population, but recently it has been shown that it is one of
the essential components in obtaining a scalable MOGA in general, and MOEDA in particular
(Pelikan, Sastry, & Goldberg, 2005).



In this study, we use mBOA and the multiobjective extended compact GA and test their
scalability on a class of boundedly-difficult problems. The multiobjective extended compact genetic
algorithm (meCGA) is similar to mBOA (Khan, Goldberg, & Pelikan, 2002), except that the model
building and sampling procedure of BOA is replaced with those of the extended compact GA (eCGA)
(Harik, 1999). The meCGA is used in this study in part because the simplicity of the probabilistic
model and its direct mapping to linkage groups make it amenable to systematic analysis. The
typical steps of meCGA can be outlined as follows:

1. Initialization: The population is usually initialized with random individuals. However, other
initialization procedures can also be used in a straightforward manner.

2. Evaluation: The fitness or the quality measure of the individuals is computed.

3. Selection: As in mBOA, we use the selection procedure of NSGA-II. That is, we first perform
the non-dominated sorting, and compute the crowding distance for all the individuals in the
population. We then use the individual comparison operator to bias the generation of new
individuals.

4. Probabilistic model estimation: Unlike traditional GAs, EDAs assume a particular
probabilistic model of the data, or a class of allowable models. A class-selection metric and a
class-search mechanism are used to search for an optimum probabilistic model that represents
the selected individuals.

Model representation: The probability distribution used in eCGA is a class of probability
models known as marginal product models (MPMs). MPMs partition genes into mutually
independent groups and specify marginal probabilities for each linkage group.

Class-Selection metric: To distinguish better model instances from worse ones,
eCGA uses a minimum description length (MDL) metric (Rissanen, 1978). The key concept
behind MDL models is that, all things being equal, simpler models are better than more
complex ones. The MDL metric used in eCGA is a sum of two components:

• Model complexity, which quantifies the model representation size in terms of the number
of bits required to store all the marginal probabilities:

\[ C_m = \log_2(n) \sum_{i=1}^{m} \left(2^{k_i} - 1\right), \tag{1} \]

where n is the population size, m is the number of linkage groups, and k_i is the size of the
i-th group.

• Compressed population complexity, which quantifies the data compression in terms
of the entropy of the marginal distribution over all partitions:

\[ C_p = n \sum_{i=1}^{m} \sum_{j=1}^{2^{k_i}} -p_{ij} \log_2\left(p_{ij}\right), \tag{2} \]

where p_ij is the frequency of the j-th gene sequence of the genes belonging to the i-th
partition.

Class-Search method: In eCGA, both the structure and the parameters of the model are
searched and optimized to best fit the data. While the probabilities are learnt based on the
variable instantiations in the population of selected individuals, a greedy-search heuristic is
used to find an optimal or near-optimal probabilistic model. The search method starts by
treating each decision variable as independent. The probabilistic model in this case is a vector
of probabilities, representing the proportion of individuals among the selected individuals
having a value '1' (or alternatively '0') for each variable. The model-search method continues
by merging the two partitions whose merger yields the greatest improvement in the model-metric
score. The subset merges are continued until no more improvement in the metric value is possible.

5. Offspring creation: New individuals are created by sampling the probabilistic model. The
offspring population is generated by randomly choosing subsets from the current individuals
according to the probabilities of the subsets as calculated in the probabilistic model.

6. Replacement: We use two replacement techniques in this study: (1) Restricted tournament
replacement (RTS) (Harik, 1995), in which an offspring replaces the closest individual among w
individuals randomly selected from the parent population, only if the offspring is better than
that closest parent. (2) Elitist replacement used in NSGA-II, in which the parent and offspring
populations are combined. The domination ranks and crowding distances are computed on the
combined population. Individuals are added to the new population in order of increasing rank,
starting from those with the lowest rank, until its size reaches n. However, if
it is not possible to add all the solutions belonging to a particular rank without increasing
the population size beyond n, then individuals with greater crowding distances are
preferred.

7. Repeat steps 2-6 until one or more termination criteria are met. 
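
The model-estimation step (step 4) can be illustrated with a short Python sketch. It assumes binary strings stored as tuples, scores a marginal product model with the two MDL components of equations (1) and (2), and greedily merges linkage groups while the combined score improves. The function names `mdl_score` and `greedy_mpm` are hypothetical, and the exhaustive pairwise search is a simplification written for clarity rather than speed.

```python
import math
from collections import Counter
from itertools import combinations


def mdl_score(pop, partition):
    """Model complexity C_m plus compressed population complexity C_p."""
    n = len(pop)
    # C_m: log2(n) bits for each of the (2^k_i - 1) free marginal probabilities.
    model = math.log2(n) * sum(2 ** len(group) - 1 for group in partition)
    # C_p: n times the summed entropy of the observed marginals of every group.
    compressed = 0.0
    for group in partition:
        counts = Counter(tuple(x[i] for i in group) for x in pop)
        compressed -= n * sum((c / n) * math.log2(c / n) for c in counts.values())
    return model + compressed


def greedy_mpm(pop):
    """Start from independent bits and merge the best pair until no merge helps."""
    partition = [(i,) for i in range(len(pop[0]))]
    best = mdl_score(pop, partition)
    while len(partition) > 1:
        candidates = []
        for a, b in combinations(partition, 2):
            merged = [g for g in partition if g not in (a, b)] + [a + b]
            candidates.append((mdl_score(pop, merged), merged))
        score, merged = min(candidates, key=lambda c: c[0])
        if score >= best:  # no merge lowers the description length
            break
        best, partition = score, merged
    return partition


# Toy selected population: bits 0 and 1 always agree, bit 2 is independent.
pop = [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1)] * 50
print(greedy_mpm(pop))  # bits 0 and 1 end up in one linkage group
```

On the toy population, merging bits 0 and 1 removes one full bit of entropy per individual at the cost of one extra marginal probability, so the merged model has the shorter description.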

3 Experiments



As mentioned earlier, one of the purposes of this study is to investigate the scalability of MOEDAs, 
particularly meCGA and mBOA, on a class of boundedly-difficult multiobjective problems. In this 
section, we describe the test functions used to test the scalability and the methodology used in 
obtaining the empirical results. 



3.1 Test Problems 



Our approach in verifying the performance of sub-structural niching is to consider bounding
adversarial problems that exploit one or more dimensions of problem difficulty (Goldberg, 2002).
Particularly, we are interested in problems where building-block identification is critical for the
GA success. Additionally, the problem solver (eCGA) should not have any knowledge of the
building-block structure of the test problem, but that structure should be known to researchers
for verification purposes.

One such class of problems is the m-k deceptive trap problem, which consists of additively
separable deceptive functions (Ackley, 1987; Goldberg, 1987; Deb & Goldberg, 1992). Deceptive
functions are designed to thwart the very mechanism of selectorecombinative search by punishing 
any localized hillclimbing and requiring mixing of whole building blocks at or above the order 
of deception. Using such adversarially designed functions is a stiff test — in some sense the stiffest 
test — of algorithm performance. The idea is that if an algorithm can beat an adversarially designed 
test function, it can solve other problems that are equally hard or easier than the adversarial 
function. 

In this study, we use a class of test problems with two objectives: (1) m-k deceptive trap, and
(2) m-k deceptive inverse trap. String positions are first divided into disjoint subsets or partitions
of k bits each. The k-bit trap and inverse trap are defined as follows:

\[ \mathrm{trap}_k(u) = \begin{cases} 1 & \text{if } u = k \\ (1-d)\left[1 - \frac{u}{k-1}\right] & \text{otherwise} \end{cases} \tag{3} \]

\[ \mathrm{invtrap}_k(u) = \begin{cases} 1 & \text{if } u = 0 \\ (1-d)\left[\frac{u-1}{k-1}\right] & \text{otherwise} \end{cases} \tag{4} \]

where u is the number of 1s in the input string of k bits, and d is the signal difference. Here, we
use k = 3, 4, and 5, and d = 0.9, 0.75, and 0.8, respectively.

The m-k trap and inverse trap functions have conflicting objectives. Any solution that sets the
bits in each partition either to all 0s or to all 1s is Pareto optimal, and thus there are a total of
$2^m$ solutions in the Pareto-optimal front with m + 1 distinct points in the objective space. We
investigate the scalability of MOEAs and consider the population size and number of function
evaluations required to maintain at least one copy of all the representative Pareto-optimal solutions.
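
The structure of this Pareto-optimal front can be checked by brute force on a small instance. The sketch below assumes the usual form of the two sub-functions, namely that the trap is 1 at u = k and (1 - d)(1 - u/(k - 1)) otherwise, and that the inverse trap is its mirror image. Exact rational arithmetic is used so that equal objective values compare equal, and the helper names are illustrative only.

```python
from fractions import Fraction
from itertools import product


def trap(u, k, d):
    """k-bit trap: optimum at u == k, deceptive attractor at u == 0."""
    return Fraction(1) if u == k else (1 - d) * (1 - Fraction(u, k - 1))


def inv_trap(u, k, d):
    """k-bit inverse trap: optimum at u == 0, deceptive attractor at u == k."""
    return Fraction(1) if u == 0 else (1 - d) * Fraction(u - 1, k - 1)


def objectives(bits, k, d):
    """Sum both sub-functions over consecutive k-bit partitions."""
    ones = [sum(bits[i:i + k]) for i in range(0, len(bits), k)]
    return (sum(trap(u, k, d) for u in ones),
            sum(inv_trap(u, k, d) for u in ones))


def dominates(a, b):
    """Maximization: a is no worse everywhere and better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(m, k, d):
    """Enumerate all 2^(m*k) strings and keep the non-dominated ones."""
    scored = [(bits, objectives(bits, k, d))
              for bits in product((0, 1), repeat=m * k)]
    return [(bits, f) for bits, f in scored
            if not any(dominates(g, f) for _, g in scored)]


front = pareto_front(m=2, k=3, d=Fraction(9, 10))
print(len(front), len({f for _, f in front}))  # 2^m solutions, m + 1 distinct points
```

For m = 2 and k = 3 the enumeration finds the four strings whose partitions are all 0s or all 1s, which map to three distinct objective-value pairs.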

To illustrate how additively decomposable problems with conflicting objectives can overwhelm
the niching mechanism used in MOEAs, irrespective of the linkage adaptation capabilities of the
evolutionary algorithm, and lead to exponential scalability, we consider a problem where linkage
learning is not required. Specifically, we consider the OneMax-ZeroMax problem, which is similar
to the bicriteria OneMax problem used by Chen for developing facetwise models of population sizing
and convergence time (Chen, 2004). In the OneMax-ZeroMax problem, the task is to maximize two
objectives, one of which is the number of bits with value 1, and the other the number of bits
with value 0:

\[ f_{\mathrm{OneMax}}(X) = \sum_{i=1}^{\ell} x_i, \tag{5} \]

\[ f_{\mathrm{ZeroMax}}(X) = \sum_{i=1}^{\ell} (1 - x_i), \tag{6} \]

where $\ell$ is the problem size, and $x_i$ is the value of the i-th bit of a candidate solution X.
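
A small enumeration makes the size of this front concrete. Because the two objectives always sum to the string length, no string can dominate another, so every one of the 2^ℓ strings is Pareto optimal while only ℓ + 1 distinct objective-value pairs exist. The sketch below groups all strings of a given length by their objective pair; `front_summary` is a hypothetical helper name.

```python
from itertools import product


def one_max(x):
    """Number of bits set to 1."""
    return sum(x)


def zero_max(x):
    """Number of bits set to 0."""
    return sum(1 - b for b in x)


def front_summary(length):
    """Map every objective-value pair to the list of strings that attain it."""
    points = {}
    for x in product((0, 1), repeat=length):
        points.setdefault((one_max(x), zero_max(x)), []).append(x)
    return points


points = front_summary(6)
for pair in sorted(points):
    print(pair, len(points[pair]))
```

The counts per objective pair follow the binomial coefficients, so the middle of the front holds far more genotypes than the two extremes.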

3.2 Methodology

We tested the scalability of the following MOEAs: (1) Univariate marginal distribution algorithm
(UMDA) (Mühlenbein & Paaß, 1996), (2) NSGA-II with two-point crossover and bit-flip mutation,
(3) meCGA, and (4) mhBOA. For each recombination operator, both elitist replacement of NSGA-II
and restricted tournament replacement were used.

For all test problems and algorithms, different problem sizes were examined to study scalability.
For each problem type, problem size, and algorithm, a bisection method was used to determine the
minimum population size needed to allocate at least one individual to each representative solution
in the Pareto front. As mentioned earlier, for the test problems we consider in this study, an
$\ell$-bit problem (where $\ell = m \cdot k$) has $2^m$ Pareto-optimal solutions with m + 1 distinct
objective-value pairs. In this study, we investigate the population size required to (1) find at
least one copy of all the $2^m$ Pareto-optimal solutions, and (2) find at least one copy of the
m + 1 distinct points in the Pareto-optimal front. In the second case, we consider Pareto-optimal
solutions with the same values of both objectives to be equivalent.

The probability of maintaining at least one copy of all the representative Pareto-optimal
solutions at a given population size is computed by averaging 10-30 independent MOEA runs. The
minimum population size required to maintain at least one copy of all the representative solutions
in the Pareto front is averaged over 10-30 independent bisection runs. Therefore, the results for
each problem type, problem size, and algorithm correspond to 100-900 independent GA runs. The
number of generations for UMDA, meCGA, and mhBOA was bounded by $5\ell$, whereas the runs
with NSGA-II were given at most $10\ell$ or $20\ell$ generations because of their slower convergence.
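
A bisection search for the minimum population size can be sketched as follows. This is one common way to organize such a search and not necessarily the exact procedure used in the experiments: the population size is doubled until a success criterion is met, and the bracket is then halved repeatedly. The callable `succeeds` and the starting size are assumptions made for the illustration; in practice `succeeds(n)` would run the MOEA several times with population size n and report whether every representative Pareto-optimal solution was maintained.

```python
def min_population_size(succeeds, start=16):
    """Smallest n in the bracket for which succeeds(n) holds.

    Assumes success is monotone in n: once a size works, larger sizes work too.
    """
    low, high = 0, start
    # Phase 1: double the population size until the criterion is met.
    while not succeeds(high):
        low, high = high, high * 2
    # Phase 2: bisect between the last failure (low) and the first success (high).
    while high - low > 1:
        mid = (low + high) // 2
        if succeeds(mid):
            high = mid
        else:
            low = mid
    return high


# Stand-in criterion for demonstration only: pretend 137 is the true threshold.
print(min_population_size(lambda n: n >= 137))
```

Because real success criteria are stochastic, the search is usually repeated and the resulting sizes averaged, as described above.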



4 Results 

We described the test problems and the experimental methodology used in testing the scalability
of MOEDAs in the previous section. In this section we present the scalability results, followed by
a demonstration of how multiobjective problems with conflicting sub-structures can overwhelm
the niching mechanism and lead to exponential scale-up. Finally, we use facetwise models of
population sizing as dictated by model building, decision making, and supply
(Sastry & Goldberg, 2004; Pelikan, Sastry, & Goldberg, 2003), and niching (Mahfoud, 1994) to
estimate the growth rate of conflicting sub-structures that keeps the niching method from being
overwhelmed and leads to polynomial scalability.

4.1 Scalability of MOEDAs 

We measure the algorithm performance in terms of minimum number of function evaluations re- 
quired to find and maintain at least one copy of all the representative Pareto-optimal solutions. 

Figure 1: Scalability of meCGA with crowding and with RTS for the m-3 deceptive trap and
inverse trap with the problem size. Here, we plot the minimum number of function evaluations
required to search and maintain at least one copy of (a) all the $2^m$ solutions in the
Pareto-optimal front, and (b) only the m + 1 solutions in the Pareto-optimal front with different
objective-value pairs. Here, we treat the genotypically (and phenotypically) different
Pareto-optimal solutions with the same values in both objectives as equivalent.



Even though we have tried m-k deceptive trap and inverse trap functions for k = 3, 4, and 5, for 
brevity, we only show results for k = 3 in this paper. However, we note that the results for other 
values of k are qualitatively similar and those for k = 3 are representative of the behavior of the 
MOEAs. 

Figure 1(a) shows the scalability of meCGA with the problem size for the m-3 deceptive trap and
inverse trap problem. We plot the minimum number of function evaluations required to allocate at
least one copy of all the solutions in the Pareto-optimal front. As shown in the figure, all
algorithms scale up exponentially. The scale-up does not improve even if we restrict the
requirement to finding only those m + 1 Pareto-optimal solutions with different objective-value
pairs, as shown in Figure 1(b). That is, even if we consider genotypically (and phenotypically)
distinct solutions that have the same value in both objectives to be equivalent, all the algorithms
scale exponentially. This is despite the linkage information being identified correctly by meCGA
and mhBOA and the tight linkage assumption for NSGA-II. Additionally, the scalability does not
improve whether the niching or speciation is performed in the objective space (as in NSGA-II) or
in the variable space (as in restricted tournament selection).

Therefore, the exponential scale-up is not due to incorrect linkage identification and mixing
(Goldberg, Thierens, & Deb, 1993; Thierens & Goldberg, 1993; Thierens, 1999), but because the
niching mechanism gets quickly overwhelmed due to the exponential growth in the number of
Pareto-optimal solutions. Furthermore, the distribution of the $2^m$ solutions in the Pareto-optimal
front is not uniform. There are exponentially more solutions in the middle of the front than at
the edges (see Table 1). That is, there is only one solution (a binary string with all 0s or all 1s)
at each extreme of the front, whereas there are $\binom{m}{m/2}$ genotypically different solutions,
a number that grows exponentially with m, in the middle of the Pareto-optimal front with the same
values in both objectives.

$n_{1,\mathrm{BBs}}$    0    1        ...    i                 ...    m
$n_{0,\mathrm{BBs}}$    m    m - 1    ...    m - i             ...    0
# solutions             1    m        ...    $\binom{m}{i}$    ...    1

Table 1: Distribution of genotypically and phenotypically different solutions in the Pareto-optimal
front with the same values in both objectives. $n_{1,\mathrm{BBs}}$ refers to the number of k-bit
partitions (sub-structures) with 1s and $n_{0,\mathrm{BBs}}$ is the number of k-bit partitions with 0s.
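
The distribution in Table 1 is simply a row of binomial coefficients, which a few lines of Python can tabulate. The sketch below lists, for a given number of sub-structures m, how many genotypically different Pareto-optimal solutions share each objective-value pair; `front_distribution` is an illustrative name.

```python
from math import comb


def front_distribution(m):
    """Entry i is the number of Pareto-optimal strings with i partitions of all 1s."""
    return [comb(m, i) for i in range(m + 1)]


counts = front_distribution(10)
print(counts)
print(sum(counts), max(counts) / counts[0])
```

For m = 10, the single solution at each extreme contrasts with 252 solutions at the middle point, out of 1024 in total.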

Figure 2: Probability of finding and maintaining different solutions on the Pareto-optimal front
for the 10-3 deceptive trap and inverse trap problem as a function of population size.



This highly non-linear distribution of solutions in the Pareto-front has two effects on the niching 
mechanisms used in MOEAs in general, and MOEDAs in particular: 

• Since the extremes of the Pareto-optimal front (maximizing most partitions or sub-structures
with respect to one particular objective) have exponentially fewer representatives than
the middle, it takes an exponentially longer time, or an exponentially larger population size
(Goldberg, 2002; Thierens, 1999), to search and maintain the solutions at the extremes of the
Pareto-optimal front. When the population size is fixed, the probability of maintaining a
solution in the middle of the Pareto-optimal front is higher than doing so at the extremes of the
front, as shown in Figure 2.

• Since there are multiple points that are genotypically and phenotypically different, but lie on
the same point on the Pareto-optimal front (the solutions have the same values in both objectives),
some of them vanish over time due to drift. The drift affects both the solutions in the middle
and those near the extremes of the Pareto-optimal front.

Figure 3: Scalability of NSGA-II and UMDA on the OneMax-ZeroMax problem. Both algorithms
with two different niching methods scale up exponentially with the problem size.



4.2 Overwhelming the Niching Method 

To better illustrate how sub-structure competition in all the partitions of a decomposable
problem can overwhelm the nicher, we use the OneMax-ZeroMax problem. We specifically choose
the OneMax-ZeroMax problem to isolate the effect of linkage identification, or lack thereof, from
that of the niching methods on the scalability of the MOEAs. Unlike the m-k deceptive trap
and inverse trap function, linkage identification is not necessary for the OneMax-ZeroMax
problem. Furthermore, both OneMax and ZeroMax are GA-easy problems which a simple
selectorecombinative GA with uniform crossover and tournament selection can solve in linear time.
In contrast, MOEAs scale up exponentially in solving OneMax-ZeroMax, as shown in Figure 3.
The results clearly indicate how the niching methods, both those that work in parameter space
(RTS) and those that work in objective space (crowding), get overwhelmed due to the exponentially
large number of solutions in the Pareto-optimal front. Additionally, the results also show that
even if the requirement is relaxed by treating all the different solutions that lie on the same point
in the Pareto-optimal front as equivalent, the scale-up does not improve. Finally, the results
suggest that in decomposable problems, if all or a majority of the sub-structures compete in the
two objectives, then the niching method fails to maintain good coverage, leading to exponential
scale-up.

4.3 Circumventing the Burden on the Niching Method 

The results in the previous two sections clearly indicate that MOEDAs with either RTS or the
crowding mechanism of NSGA-II scale up exponentially with problem size. We also demonstrated that
the exponential scalability is due to the niching method being overwhelmed because of the
exponentially large number of solutions in the Pareto-optimal front. One way to keep the niching
method from being overwhelmed is to control the growth rate of the number of sub-structures that
compete in the two objectives, $m_d$. That is, for a problem with m sub-structures, the two
objectives compete in only $m_d$ sub-structures and share the same $m - m_d$ sub-structures. Since
the total number of Pareto-optimal solutions is $n_{opt} = 2^{m_d}$, by controlling the number of
competing sub-structures, we implicitly control the total number of Pareto-optimal solutions.

The growth rate of the competing sub-structures should be such that the effect of model
accuracy, decision making, and sub-structure supply on the population sizing is dominant over the
effect of niching on the population size. The effect of model accuracy, decision making, and
sub-structure supply on the population sizing of EDAs is given by (Pelikan, Sastry, & Goldberg, 2003;
Sastry & Goldberg, 2004):

\[ n_{eda} \propto c_1 \cdot 2^k \cdot m \log m. \tag{7} \]

The effect of the niching method on the population sizing of GAs was modeled by Mahfoud
(Mahfoud, 1994) and is reproduced below:

\[ n_{niching} = \frac{\log\left[\left(1 - \gamma^{1/t}\right)/n_{opt}\right]}{\log\left[\left(n_{opt} - 1\right)/n_{opt}\right]} \approx c_2 \cdot 2^{m_d}, \tag{8} \]

where t is the number of generations we need to maintain all the niches and $\gamma$ is the
probability of maintaining them. While Mahfoud derived the population-sizing estimate for the
fitness-sharing method, it is generally applicable to other niching methods and MOEAs as well
(Reed, 2002; Khan, 2002).
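
To get a feel for how quickly the niching requirement grows, the expression can be evaluated numerically. The sketch below assumes the niching model has the form log[(1 - γ^(1/t)) / n_opt] / log[(n_opt - 1) / n_opt], with γ the target probability of keeping every niche for t generations. The values γ = 0.95 and t = 50 are arbitrary illustration choices, not settings taken from the experiments.

```python
import math


def niching_population_size(n_opt, gamma, t):
    """Population size needed to keep all n_opt niches for t generations
    with probability gamma, under the assumed niching model. Requires n_opt > 1."""
    return (math.log((1 - gamma ** (1 / t)) / n_opt)
            / math.log((n_opt - 1) / n_opt))


for m_d in range(2, 11, 2):
    n_opt = 2 ** m_d
    print(m_d, n_opt, round(niching_population_size(n_opt, gamma=0.95, t=50)))
```

Each additional competing sub-structure doubles the number of niches and slightly more than doubles the required population size, which is the exponential growth that the next step tries to bound.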



To keep the niching method from being overwhelmed, we require $n_{eda} \geq n_{niching}$. That is,

\[ c_2 \cdot 2^{m_d} \leq c_1 \cdot 2^k \cdot m \log m. \tag{9} \]

The above equation can be approximated¹ to obtain a conservative estimate of the maximum number
of competing sub-structures that keeps the niching mechanism from being overwhelmed:

\[ m_d \leq k + \log_2(m). \tag{10} \]

The above growth rate is compared to empirical results for different values of k as a function of
the total number of sub-structures in the problem, and the results are shown in Figure 4. As shown
in Figure 5, the results indicate that once the growth rate of competing sub-structures is
controlled, the MOEDAs scale up polynomially with the problem size, even on the OneMax-ZeroMax
problem.
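
The bound on competing sub-structures is easy to tabulate. The sketch below rounds k + log2(m) down to an integer and, as an added assumption, caps the result at m, since no more than m sub-structures can compete. It also reports the implied number of Pareto-optimal solutions, which is at most 2^k · m and therefore polynomial in m for fixed k. The function names are illustrative.

```python
import math


def max_competing_substructures(m, k):
    """Conservative bound m_d <= k + log2(m), rounded down and capped at m."""
    return min(m, math.floor(k + math.log2(m)))


def pareto_front_size(m, k):
    """Number of Pareto-optimal solutions when m_d sub-structures compete."""
    return 2 ** max_competing_substructures(m, k)


for m in (4, 10, 16, 64):
    print(m, max_competing_substructures(m, 3), pareto_front_size(m, 3))
```

With k = 3 and m = 16, for example, at most 7 sub-structures may compete, giving 128 Pareto-optimal solutions instead of the 65536 that arise when all 16 compete.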



5 Summary and Conclusions 

In this paper, we studied the scalability of multiobjective estimation of distribution algorithms
(MOEDAs), specifically the multiobjective extended compact genetic algorithm (meCGA) and the
multiobjective hierarchical Bayesian optimization algorithm (mhBOA), on a class of
boundedly-difficult additively separable problems. We observed that even when the linkages were
correctly identified, the MOEDAs scaled up exponentially with problem size due to failure of the
niching mechanisms: massive multimodality of the search problems can easily overwhelm the nicher.
That is, in decomposable problems, if a majority or all of the sub-structures compete in different
objectives, then the number of Pareto-optimal solutions increases exponentially. This exponential
increase overwhelms the nicher and causes significant problems in maintaining a good coverage of
the Pareto-optimal front. Finally, using facetwise models that incorporate the combined effects of
model accuracy, decision making, and sub-structure supply, and the effect of niching on the
population sizing, we proposed a growth rate for the maximum number of sub-structures that can
compete in the two objectives to circumvent the failure of the niching method. Once the growth
rate of the number of Pareto-optimal solutions is controlled, the MOEDAs scale up polynomially
with the problem size.

¹ We neglect the $\log_2\left(\frac{c_1 \log m}{c_2}\right)$ term.

Figure 4: The growth rate of the number of sub-structures that compete in the two objectives for
different values of k as a function of the total number of sub-structures.

Figure 5: The scalability of meCGA with the crowding mechanism of NSGA-II and RTS niching
for both OneMax-ZeroMax and m-3 deceptive trap and inverse trap problems. The growth rate of the
number of sub-structures that compete in the two objectives for a given problem size is controlled
as given by equation 10.